Description

FIELD OF THE INVENTION 

[0001] The present invention relates to isolated polynucleotides that represent a complete gene, or a fragment
thereof, that is expressed. In addition, the present invention relates to the polypeptide or protein corresponding to
the coding sequence of these polynucleotides. The present invention also relates to isolated polynucleotides that
represent regulatory regions of genes. The present invention also relates to isolated polynucleotides that represent
untranslated regions of genes. The present invention further relates to the use of these isolated polynucleotides and
polypeptides and proteins.

DESCRIPTION OF THE RELATED ART 

[0002] Efforts to map and sequence the genome of a number of organisms are in progress; a few complete genome
sequences, for example those of E. coli and Saccharomyces cerevisiae, are known (Blattner et al., Science 277:1453
(1997); Goffeau et al., Science 274:546 (1996)). The complete genome of a multicellular organism, C. elegans, has
also been sequenced (See, the C. elegans Sequencing Consortium, Science 282:2012 (1998)). To date, no complete
genome of a plant has been sequenced, nor has a complete cDNA complement of any plant been sequenced.

SUMMARY OF THE INVENTION 

[0003] The present invention comprises polynucleotides, such as complete cDNA sequences and/or sequences of
genomic DNA encompassing complete genes, fragments of genes, and/or regulatory elements of genes and/or regions
with other functions and/or intergenic regions, hereinafter collectively referred to as Sequence-Determined DNA
Fragments (SDFs), from different plant species, particularly corn, wheat, soybean, rice and Arabidopsis thaliana, and
other plants and/or mutants, variants, fragments or fusions of said SDFs and polypeptides or proteins derived
therefrom. In some instances, the SDFs span the entirety of a protein-coding segment. In some instances, the entirety
of an mRNA is represented. Other objects of the invention that are also represented by SDFs of the invention are
control sequences, such as, but not limited to, promoters. Complements of any sequence of the invention are also
considered part of the invention.

[0004] Other objects of the invention are polynucleotides comprising exon sequences, polynucleotides comprising 
intron sequences, polynucleotides comprising introns together with exons, intron/exon junction sequences, 5' untrans- 
lated sequences, and 3' untranslated sequences of the SDFs of the present invention. Polynucleotides representing 
the joinder of any exons described herein, in any arrangement, for example, to produce a sequence encoding any 
desirable amino acid sequence are within the scope of the invention. 

[0005] The present invention also resides in probes useful for isolating and identifying nucleic acids that hybridize
to an SDF of the invention. The probes can be of any length, but typically are 12 to 2000 nucleotides in length;
more typically, 15 to 200 nucleotides long; even more typically, 18 to 100 nucleotides long.

[0006] Yet another object of the invention is a method of isolating and/or identifying nucleic acids using the following 
steps: 

(a) contacting a probe of the instant invention with a polynucleotide sample under conditions that permit hybridi- 
zation and formation of a polynucleotide duplex; and 

(b) detecting and/or isolating the duplex of step (a). 

[0007] The conditions for hybridization can be from low to moderate to high stringency conditions. The sample can 
include a polynucleotide having a sequence unique in a plant genome. Probes and methods of the invention are useful, 
for example, without limitation, for mapping of genetic traits and/or for positional cloning of a desired fragment of ge- 
nomic DNA. 

[0008] Probes and methods of the invention can also be used for detecting alternatively spliced messages within a 
species. Probes and methods of the invention can further be used to detect or isolate related genes in other plant 
species using genomic DNA (gDNA) and/or cDNA libraries. In some instances, especially when longer probes and low 
to moderate stringency hybridization conditions are used, the probe will hybridize to a plurality of cDNA and/or gDNA 
sequences of a plant. This approach is useful for isolating representatives of gene families which are identifiable by 
possession of a common functional domain in the gene product or which have common cis-acting regulatory sequences. 
This approach is also useful for identifying orthologous genes from other organisms. 

[0009] The present invention also resides in constructs for modulating the expression of the genes comprised of all
or a fragment of an SDF. The constructs comprise all or a fragment of the expressed SDF, or of a complementary
sequence. Examples of constructs include ribozymes comprising RNA encoded by an SDF or by a sequence
complementary thereto, antisense constructs, constructs comprising coding regions or parts thereof, constructs
comprising promoters, introns, untranslated regions, scaffold attachment regions, methylating regions, enhancing or
reducing regions, DNA and chromatin conformation modifying sequences, etc. Such constructs can be constructed using
viral, plasmid, bacterial artificial chromosomes (BACs), plasmid artificial chromosomes (PACs), autonomous plant
plasmids, plant artificial chromosomes or other types of vectors and exist in the plant as autonomous replicating
sequences or as DNA integrated into the genome. When inserted into a host cell the construct is, preferably,
functionally integrated with, or operatively linked to, a heterologous polynucleotide. For instance, a coding region
from an SDF might be operably linked to a promoter that is functional in a plant.

[0010] The present invention also resides in host cells, including bacterial or yeast cells or plant cells, and plants
that harbor constructs such as described above. Another aspect of the invention relates to methods for modulating
expression of specific genes in plants by expression of the coding sequence of the constructs, by regulation of
expression of one or more endogenous genes in a plant or by suppression of expression of the polynucleotides of the
invention in a plant. Methods of modulation of gene expression include without limitation (1) inserting into a host
cell additional copies of a polynucleotide comprising a coding sequence; (2) modulating an endogenous promoter in a
host cell; (3) inserting antisense or ribozyme constructs into a host cell; and (4) inserting into a host cell a
polynucleotide comprising a sequence encoding a variant, fragment, or fusion of the native polypeptides of the instant
invention.

BRIEF DESCRIPTION OF THE TABLES 

[0011] The sequences of exemplary SDFs and polypeptides corresponding to the coding sequences of the instant
invention are described in Reference Tables 1 and 2 ("REF Tables 1 and 2") and in Sequence Tables 1 and 2 ("SEQ
Tables 1 and 2"). The REF Tables refer to a number of "Maximum Length Sequences" or "MLS." Each MLS corresponds
to the longest cDNA obtained, either by cloning or by the prediction from genomic sequence. The sequence of the
MLS is the cDNA sequence as described in the (Ac) subsection of the REF Tables.
[0012] The REF Table includes the following information relating to each MLS: 

I. cDNA Sequence

   A. 5' UTR
   B. Coding Sequence
   C. 3' UTR

II. Genomic Sequence

   A. Exons
   B. Introns
   C. Promoters

III. Link of cDNA Sequences to Clone IDs

IV. Multiple Transcription Start Sites

V. Polypeptide Sequences

   A. Signal Peptide
   B. Domains
   C. Related Polypeptides

VI. Related Polynucleotide Sequences

I. cDNA SEQUENCE

[0013] The REF Tables indicate which sequence in the SEQ Tables represents the sequence of each MLS. The MLS
sequence can comprise 5' and 3' UTR as well as coding sequences. In addition, specific cDNA clone numbers also
are included in the REF Tables when the MLS sequence relates to a specific cDNA clone.

A. 5' UTR

[0014] The location of the 5' UTR can be determined by comparing the most 5' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at any of the transcriptional
start sites and ending at the last nucleotide before any of the translational start sites, corresponds to the 5' UTR.

EP 1 033 405 A2

B. Coding Region 

[0015] The coding region is the sequence in any open reading frame found in the MLS. Coding regions of interest
are indicated in the PolyP SEQ subsection of the REF Tables.

C. 3' UTR 

[0016] The location of the 3' UTR can be determined by comparing the most 3' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at the translational stop
site and ending at the last nucleotide of the MLS, corresponds to the 3' UTR.
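The UTR boundaries described above can be illustrated with a minimal sketch. It assumes that the MLS begins at a transcriptional start site, that positions are 1-based, and that the 3' UTR is taken to begin at the first base of the stop codon, as the text states. The function name `split_mls` and the toy sequence are hypothetical and are not taken from the REF Tables.

```python
def split_mls(mls, translation_start, stop_site):
    """Split an MLS into (5' UTR, coding region, 3' UTR).

    Assumptions (hypothetical conventions, not from the REF Tables):
    - positions are 1-based;
    - the MLS begins at a transcriptional start site;
    - translation_start is the first base of the start codon;
    - stop_site is the first base of the stop codon, where the 3' UTR begins.
    """
    five_utr = mls[: translation_start - 1]              # up to the base before the start codon
    coding = mls[translation_start - 1 : stop_site - 1]  # start codon up to the stop site
    three_utr = mls[stop_site - 1 :]                     # stop site to the last MLS base
    return five_utr, coding, three_utr


# Toy example: 4-base 5' UTR, 6-base coding region, 5-base 3' UTR.
toy_mls = "GGCC" + "ATGAAA" + "TAAGG"
print(split_mls(toy_mls, translation_start=5, stop_site=11))
```

An MLS with several translational start sites would simply be split once per start site under the same conventions.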

II. GENOMIC SEQUENCE

[0017] The REF Tables indicate the specific "gi" number of the genomic sequence if the sequence resides in
a public databank. For each genomic sequence, the REF Tables indicate which regions are included in the MLS. These
regions can include the elements shown in the following schematic.

[Schematic of a genomic sequence. Labels: Region 1, Region 2, Region 3; 5' UTR, Exon, 3' UTR; Promoter, Intron;
Translational Start Site, Stop Codon.]



[0018] The REF Tables report the first and last base of each region that are included in an MLS sequence. An example
is shown below:

gi No. 47000:
37102 ... 37497
37593 ... 37925

The numbers indicate that the MLS contains the following sequences from two regions of gi No. 47000: a first region
including bases 37102-37497, and a second region including bases 37593-37925.
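As an illustration of how such an entry can be read, the following sketch parses the two coordinate lines of the example into pairs of first and last bases and computes the length of each region. The function names are hypothetical, and the line format "first ... last" is assumed from the example above rather than from a formal specification of the REF Tables.

```python
import re


def parse_regions(lines):
    """Parse 'first ... last' coordinate lines into (first, last) integer pairs."""
    regions = []
    for line in lines:
        match = re.fullmatch(r"\s*(\d+)\s*\.\.\.\s*(\d+)\s*", line)
        if match:  # lines that do not look like coordinates are skipped
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions


def region_lengths(regions):
    """Length of each region, counting both the first and the last base."""
    return [abs(last - first) + 1 for first, last in regions]


regions = parse_regions(["37102 ... 37497", "37593 ... 37925"])
print(regions)
print(region_lengths(regions))
```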

A. EXON SEQUENCES

[0019] The location of the exons can be determined by comparing the sequence of the regions from the genomic
sequences with the corresponding MLS sequence as indicated by the REF Tables.

i. INITIAL EXON

[0020] To determine the location of the initial exon, information from the

(1) polypeptide sequence section;
(2) cDNA polynucleotide section; and
(3) the genomic sequence section

of the REF Tables is used. First, the polypeptide section will indicate where the translational start site is located in
the MLS sequence. The MLS sequence can be matched to the genomic sequence that corresponds to the MLS. Based
on the match between the MLS and corresponding genomic sequences, the location of the translational start site can
be determined in one of the regions of the genomic sequence. The location of this translational start site is the start
of the initial exon.
[0021] Generally, the last base of the exon of the corresponding genomic region, in which the translational start site
was located, will represent the end of the initial exon. In some cases, the initial exon will end with a stop codon, when
the initial exon is the only exon.

[0022] In the case when sequences representing the MLS are in the positive strand of the corresponding genomic
sequence, the last base will be a larger number than the first base. When the sequences representing the MLS are in
the negative strand of the corresponding genomic sequence, then the last base will be a smaller number than the first
base.
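The strand rule stated above lends itself to a one-line check. The sketch below is illustrative only; the function name `strand_of` is hypothetical, and returning `None` for a single-base region, where the rule gives no answer, is an assumption of this sketch.

```python
def strand_of(first_base, last_base):
    """Infer the strand from the reported first and last base of a region.

    '+' when the last base is the larger number, '-' when it is the smaller
    number. A single-base region is ambiguous, so None is returned
    (an assumption of this sketch; the text does not address that case).
    """
    if last_base > first_base:
        return "+"
    if last_base < first_base:
        return "-"
    return None


print(strand_of(37102, 37497))  # region reported in ascending order
print(strand_of(37497, 37102))  # same region reported in descending order
```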

ii. INTERNAL EXONS

[0023] Except for the regions that comprise the 5' UTRs, 3' UTRs, initial exon or terminal exon, the remaining genomic
regions that match the MLS sequence are the internal exons. Specifically, the bases defining the boundaries of the
remaining regions also define the intron/exon junctions of the internal exons.

iii. TERMINAL EXON

[0024] As with the initial exon, the location of the terminal exon is determined with information from the

(1) polypeptide sequence section;
(2) cDNA polynucleotide section; and
(3) the genomic sequence section

of the REF Tables. The polypeptide section will indicate where the stop codon is located in the MLS sequence. The
MLS sequence can be matched to the corresponding genomic sequence. Based on the match between the MLS and
corresponding genomic sequences, the location of the stop codon can be determined in one of the regions of the
genomic sequence. The location of this stop codon is the end of the terminal exon. Generally, the first base of the exon
of the corresponding genomic region that matches the cDNA sequence, in which the stop codon was located, will
represent the beginning of the terminal exon. In some cases, the translational start site will represent the start of the
terminal exon, which will be the only exon.

[0025] In the case when the MLS sequences are in the positive strand of the corresponding genomic sequence, the
last base will be a larger number than the first base. When the MLS sequences are in the negative strand of the
corresponding genomic sequence, then the last base will be a smaller number than the first base.



B. INTRON SEQUENCES 



[0026] In addition, the introns corresponding to the MLS are defined by identifying the genomic sequence located
between the regions where the genomic sequence comprises exons. Thus, introns are defined as starting one base
downstream of a genomic region comprising an exon, and ending one base upstream from a genomic region comprising
an exon.
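The intron definition above can be sketched as follows, assuming exon-containing regions on the positive strand given as 1-based pairs of first and last bases. The function name `introns_between` is hypothetical.

```python
def introns_between(exon_regions):
    """Return intron (first, last) pairs lying between consecutive exon regions.

    Assumes positive-strand, 1-based, inclusive coordinates. Each intron starts
    one base downstream of an exon region and ends one base upstream of the next.
    """
    ordered = sorted(exon_regions)
    introns = []
    for (_, previous_last), (next_first, _) in zip(ordered, ordered[1:]):
        if next_first - previous_last > 1:  # adjacent regions leave no intron
            introns.append((previous_last + 1, next_first - 1))
    return introns


# Using the two regions of the gi No. 47000 example:
print(introns_between([(37102, 37497), (37593, 37925)]))
```

For regions on the negative strand, the coordinates would first need to be put in ascending order; that step is left out of this sketch.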

C. PROMOTER SEQUENCES

[0027] As indicated below, promoter sequences corresponding to the MLS are defined as sequences upstream of
the first exon; more usually, as sequences upstream of the first of multiple transcription start sites; even more usually,
as sequences about 2,000 nucleotides upstream of the first of multiple transcription start sites.
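A minimal sketch of the last of these definitions is given below. It assumes 1-based positive-strand genomic coordinates and treats "about 2,000 nucleotides" as exactly 2,000 by default; the function name `promoter_window` and the clipping at the start of the genomic sequence are assumptions of the sketch.

```python
def promoter_window(first_tss, length=2000):
    """Return (first, last) genomic coordinates of the region upstream of a TSS.

    Assumes 1-based, positive-strand coordinates. The window ends one base
    upstream of the first transcription start site and is clipped at base 1.
    Returns None if there is no sequence upstream of the TSS.
    """
    last = first_tss - 1
    if last < 1:
        return None
    first = max(1, last - length + 1)
    return (first, last)


print(promoter_window(37102))  # full 2,000-nucleotide window
print(promoter_window(500))    # window clipped at the start of the sequence
```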



III. LINK OF cDNA SEQUENCES TO CLONE IDs

[0028] As noted above, the REF Tables identify the cDNA clone(s) that relate to each MLS. The MLS sequence can
be longer than the sequences included in the cDNA clones. In such a case, the REF Table indicates the region of the
MLS that is included in the clone. If either the 5' or 3' terminus of the cDNA clone sequence is the same as that of
the MLS sequence, no mention will be made.

IV. Multiple Transcription Start Sites 

[0029] Initiation of transcription can occur at a number of sites of the gene. The REF Tables indicate the possible
multiple transcription sites for each gene. In the REF Tables, the location of the transcription start sites can be either
a positive or negative number. The positions indicated by positive numbers refer to the transcription start sites as
located in the MLS sequence. The negative numbers indicate the transcription start site within the genomic sequence
that corresponds to the MLS.

[0030] To determine the location of the transcription start sites with the negative numbers, the MLS sequence is
aligned with the corresponding genomic sequence. In the instances when a public genomic sequence is referenced,
the relevant corresponding genomic sequence can be found by direct reference to the nucleotide sequence indicated
by the "gi" number shown in the public genomic DNA section of the REF Tables. When the position is a negative number,
the transcription start site is located in the corresponding genomic sequence upstream of the base that matches the
beginning of the MLS sequence in the alignment. The negative number is relative to the first base of the MLS sequence
which matches the genomic sequence corresponding to the relevant "gi" number.
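The mapping of a negative transcription start site onto genomic coordinates can be sketched as below. The sketch assumes that -1 denotes the base immediately upstream of the first aligned MLS base and that, on the negative strand, upstream corresponds to larger genomic coordinates; the function name and both conventions are assumptions rather than statements from the REF Tables.

```python
def tss_genomic_position(mls_first_base_genomic, tss, strand="+"):
    """Map a negative TSS value to a genomic coordinate.

    mls_first_base_genomic: genomic position matching the first base of the MLS.
    tss: negative offset, where -1 is assumed to be the base immediately upstream.
    strand: '+' or '-'; on '-', upstream means larger genomic coordinates.
    """
    if tss >= 0:
        raise ValueError("only negative TSS values map upstream of the MLS")
    if strand == "+":
        return mls_first_base_genomic + tss
    if strand == "-":
        return mls_first_base_genomic - tss
    raise ValueError("strand must be '+' or '-'")


print(tss_genomic_position(37102, -25))       # positive strand
print(tss_genomic_position(37925, -25, "-"))  # negative strand
```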

[0031] In the instances when no public genomic DNA is referenced, the relevant nucleotide sequence for alignment
is the nucleotide sequence associated with the amino acid sequence designated by the "gi" number of the later PolyP
SEQ subsection.

V. Polypeptide Sequences 

[0032] The PolyP SEQ subsection lists SEQ ID NOs and Ceres SEQ ID NO for polypeptide sequences corresponding
to the coding sequence of the MLS sequence and the location of the translational start site within the coding sequence
of the MLS sequence.

[0033] The MLS sequence can have multiple translational start sites and can be capable of producing more than
one polypeptide sequence.

A. Signal Peptide 

[0034] The REF Tables also indicate in subsection (B) the cleavage site of the putative signal peptide of the polypep- 
tide corresponding to the coding sequence of the MLS sequence. Typically, signal peptide coding sequences comprise 
a sequence encoding the first residue of the polypeptide to the cleavage site residue. 
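As a sketch of the relationship just described, the signal peptide coding sequence can be taken as the codons from the first residue through the cleavage site residue. The function name, the 1-based residue numbering and the toy coding sequence are hypothetical.

```python
def signal_peptide_cds(coding_sequence, cleavage_residue):
    """Return the codons encoding residue 1 through the cleavage site residue.

    Assumes coding_sequence starts at the first base of the start codon and
    cleavage_residue is a 1-based residue number (hypothetical conventions).
    """
    if cleavage_residue < 1 or 3 * cleavage_residue > len(coding_sequence):
        raise ValueError("cleavage site lies outside the coding sequence")
    return coding_sequence[: 3 * cleavage_residue]


# Toy coding sequence of five codons plus a stop codon; cleavage after residue 3.
print(signal_peptide_cds("ATGAAACTGGCTGCATAA", 3))
```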

B. Domains 

[0035] Subsection (C) provides information regarding identified domains (where present) within the polypeptide and 
(where present) a name for the polypeptide domain. 

C. Related Polypeptides 

[0036] Subsection (Dp) provides (where present) information concerning amino acid sequences that are found to be
related and have some percentage of sequence identity to the polypeptide sequences of REF and SEQ Tables 1
and 2. These related sequences are identified by a "gi" number.

VI. Related Polynucleotide Sequences 

[0037] Subsection (Dn) provides polynucleotide sequences (where present) that are related to and have some per- 
centage of sequence identity to the MLS or corresponding genomic sequence. 



Abbreviation                      Description

Max Len. Seq.                     Maximum Length Sequence
rel to                            Related to
Clone Ids                         Clone ID numbers
Pub gDNA                          Public Genomic DNA
gi No.                            gi number
Gen. seq. in cDNA                 Genomic Sequence in cDNA (Each region for a single gene prediction is listed
                                  on a separate line. In the case of multiple gene predictions, the group of
                                  regions relating to a single prediction are separated by a blank line)
(Ac) cDNA SEQ                     cDNA sequence
- Pat. Appln. SEQ ID NO           Patent Application SEQ ID NO:
- Ceres SEQ ID NO                 Ceres SEQ ID NO:
- SEQ # w. TSS                    Location within the cDNA sequence, SEQ ID NO:, of Transcription Start Sites
                                  which are listed below
- Clone ID #: # -> #              Clone ID comprises bases # to # of the cDNA Sequence
PolyP SEQ                         Polypeptide Sequence
- Pat. Appln. SEQ ID NO           Patent Application SEQ ID NO:
- Ceres SEQ ID NO                 Ceres SEQ ID NO:
- Loc. SEQ ID NO: @ nt.           Location of translational start site in cDNA of SEQ ID NO: at nucleotide
                                  number
(C) Pred. PP Nom. & Annot.        Nomination and Annotation of Domains within Predicted Polypeptide(s)
- (Title)                         Name of Domain
- Loc. SEQ ID NO #: # -> # aa.    Location of the domain within the polypeptide of SEQ ID NO: from # to #
                                  amino acid residues.
(Dp) Rel. AA SEQ                  Related Amino Acid Sequences
- Align. NO                       Alignment number
- gi No                           gi number
- Desp.                           Description
- % Idnt.                         Percent identity
- Align. Len.                     Alignment Length
- Loc. SEQ ID NO: # -> # aa       Location within SEQ ID NO: from # to # amino acid residue.



DETAILED DESCRIPTION OF THE INVENTION 

[0038] The invention relates to (I) polynucleotides and methods of use thereof, such as

IA. Probes, Primers and Substrates;

IB. Methods of Detection and Isolation;

   B.1. Hybridization;
   B.2. Methods of Mapping;
   B.3. Southern Blotting;
   B.4. Isolating cDNA from Related Organisms;
   B.5. Isolating and/or Identifying Orthologous Genes

IC. Methods of Inhibiting Gene Expression

   C.1. Antisense
   C.2. Ribozyme Constructs;
   C.3. Chimeraplasts;
   C.4. Co-Suppression;
   C.5. Transcriptional Silencing
   C.6. Other Methods to Inhibit Gene Expression

ID. Methods of Functional Analysis;

IE. Promoter Sequences and Their Use;

IF. UTRs and/or Intron Sequences and Their Use; and

IG. Coding Sequences and Their Use.

[0039] The invention also relates to (II) polypeptides and proteins and methods of use thereof, such as 

IIA. Native Polypeptides and Proteins 

A.1 Antibodies 

A.2 In Vitro Applications 

IIB. Polypeptide Variants, Fragments and Fusions 



B.1 Variants
B.2 Fragments
B.3 Fusions


[0040] The invention also includes (III) methods of modulating polypeptide production, such as

IIIA. Suppression

A.1 Antisense
A.2 Ribozymes
A.3 Co-suppression
A.4 Insertion of Sequences into the Gene to be Modulated
A.5 Promoter Modulation
A.6 Expression of Genes containing Dominant-Negative Mutations

IIIB. Enhanced Expression

B.1 Insertion of an Exogenous Gene
B.2 Promoter Modulation

[0041] The invention further concerns (IV) gene constructs and vector construction, such as 

IVA. Coding Sequences

IVB. Promoters
IVC. Signal Peptides

[0042] The invention still further relates to
(V) Transformation Techniques

Definitions 

[0043] Allelic variant: An "allelic variant" is an alternative form of the same SDF, which resides at the same
chromosomal locus in the organism. Allelic variations can occur in any portion of the gene sequence, including
regulatory regions. Allelic variants can arise by normal genetic variation in a population. Allelic variants can also be
produced by genetic engineering methods. An allelic variant can be one that is found in a naturally occurring plant,
including a cultivar or ecotype. An allelic variant may or may not give rise to a phenotypic change, and may or may not
be expressed. An allele can result in a detectable change in the phenotype of the trait represented by the locus. A
phenotypically silent allele can give rise to a product.

[0044] Alternatively spliced messages: Within the context of the current invention, "alternatively spliced messages"
refers to mature mRNAs originating from a single gene with variations in the number and/or identity of exons,
introns and/or intron-exon junctions.

[0045] Chimeric: The term "chimeric" is used to describe genes, as defined supra, or constructs wherein at least
two of the elements of the gene or construct, such as the promoter and the coding sequence and/or other regulatory
sequences and/or filler sequences and/or complements thereof, are heterologous to each other.
[0046] Constitutive Promoter: Promoters referred to herein as "constitutive promoters" actively promote transcription
under most, but not necessarily all, environmental conditions and states of development or cell differentiation. Examples
of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcript initiation region and the 1' or 2'
promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various
plant genes, such as the maize ubiquitin-1 promoter, known to those of skill.

[0047] Coordinately Expressed: The term "coordinately expressed," as used in the current invention, refers to
genes that are expressed at the same or a similar time and/or stage and/or under the same or similar environmental
conditions.

[0048] Domain: Domains are fingerprints or signatures that can be used to characterize protein families and/or
parts of proteins. Such fingerprints or signatures can comprise conserved (1) primary sequence, (2) secondary
structure, and/or (3) three-dimensional conformation. Generally, each domain has been associated with either a family
of proteins or motifs. Typically, these families and/or motifs have been correlated with specific in-vitro and/or in-vivo
activities. A domain can be any length, including the entirety of the sequence of a protein. Detailed descriptions of the
domains, associated families and motifs, and correlated activities of the polypeptides of the instant invention are
described below. Usually, the polypeptides with designated domain(s) can exhibit at least one activity that is exhibited
by any polypeptide that comprises the same domain(s).

[0049] Endogenous: The term "endogenous," within the context of the current invention, refers to any polynucleotide,
polypeptide or protein sequence which is a natural part of a cell or organisms regenerated from said cell.

[0050] Exogenous: "Exogenous," as referred to within, is any polynucleotide, polypeptide or protein sequence,
whether chimeric or not, that is initially or subsequently introduced into the genome of an individual host cell or the
organism regenerated from said host cell by any means other than by a sexual cross. Examples of means by which
this can be accomplished are described below, and include Agrobacterium-mediated transformation (of dicots, e.g.
Salomon et al., EMBO J. 3:141 (1984), Herrera-Estrella et al., EMBO J. 2:987 (1983); of monocots, representative
papers are those by Escudero et al., Plant J. 10:355 (1996), Ishida et al., Nature Biotechnology 14:745 (1996), May
et al., Bio/Technology 13:486 (1995)), biolistic methods (Armaleo et al., Current Genetics 17:97 (1990)),
electroporation, in planta techniques, and the like. Such a plant containing the exogenous nucleic acid is referred to
here as a T0 for the primary transgenic plant and T1 for the first generation. The term "exogenous" as used herein is
also intended to encompass inserting a naturally found element into a non-naturally found location.

[0051] Filler sequence: As used herein, "filler sequence" refers to any nucleotide sequence that is inserted into a
DNA construct to evoke a particular spacing between particular components such as a promoter and a coding region
and may provide an additional attribute such as a restriction enzyme site.

[0052] Gene: The term "gene," as used in the context of the current invention, encompasses all regulatory and coding
sequence contiguously associated with a single hereditary unit with a genetic function (see SCHEMATIC). Genes
can include non-coding sequences that modulate the genetic function that include, but are not limited to, those that
specify polyadenylation, transcriptional regulation, DNA conformation, chromatin conformation, extent and position of
base methylation and binding sites of proteins that control all of these. Genes comprised of "exons" (coding sequences),
which may be interrupted by "introns" (non-coding sequences), encode proteins. A gene's genetic function may require
only RNA expression or protein production, or may only require binding of proteins and/or nucleic acids without
associated expression. In certain cases, genes adjacent to one another may share sequence in such a way that one gene
will overlap the other. A gene can be found within the genome of an organism, artificial chromosome, plasmid, vector,
etc., or as a separate isolated entity.

[0053] Gene Family: "Gene family" is used in the current invention to describe a group of functionally related genes,
each of which encodes a separate protein.

[0054] Heterologous sequences: "Heterologous sequences" are those that are not operatively linked or are not
contiguous to each other in nature. For example, a promoter from corn is considered heterologous to an Arabidopsis
coding region sequence. Also, a promoter from a gene encoding a growth factor from corn is considered heterologous
to a sequence encoding the corn receptor for the growth factor. Regulatory element sequences, such as UTRs or 3'
end termination sequences that do not originate in nature from the same gene as the coding sequence originates from,
are considered heterologous to said coding sequence. Elements operatively linked in nature and contiguous to each
other are not heterologous to each other. On the other hand, these same elements remain operatively linked but become
heterologous if other filler sequence is placed between them. Thus, the promoter and coding sequences of a corn gene
expressing an amino acid transporter are not heterologous to each other, but the promoter and coding sequence of a
corn gene operatively linked in a novel manner are heterologous.

[0055] Homologous gene: In the current invention, "homologous gene" refers to a gene that shares sequence
similarity with the gene of interest. This similarity may be in only a fragment of the sequence and often represents a
functional domain such as, examples including without limitation, a DNA binding domain, a domain with tyrosine kinase
activity, or the like. The functional activities of homologous genes are not necessarily the same.

[0056] Inducible Promoter: An "inducible promoter" in the context of the current invention refers to a promoter
which is regulated under certain conditions, such as light, chemical concentration, protein concentration, conditions in
an organism, cell, or organelle, etc. A typical example of an inducible promoter, which can be utilized with the
polynucleotides of the present invention, is PARSK1, the promoter from the Arabidopsis gene encoding a serine-threonine
kinase enzyme, and which promoter is induced by dehydration, abscissic acid and sodium chloride (Wang and Goodman,
Plant J. 8:37 (1995)). Examples of environmental conditions that may affect transcription by inducible promoters
include anaerobic conditions, elevated temperature, or the presence of light.

[0057] Intergenic region: "Intergenic region," as used in the current invention, refers to nucleotide sequence
occurring in the genome that separates adjacent genes.

[0058] Mutant gene: In the current invention, "mutant" refers to a heritable change in DNA sequence at a specific
location. Mutants of the current invention may or may not have an associated identifiable function when the mutant
gene is transcribed.

[0059] Orthologous Gene: In the current invention, "orthologous gene" refers to a second gene that encodes a
gene product that performs a similar function as the product of a first gene. The orthologous gene may also have a
degree of sequence similarity to the first gene. The orthologous gene may encode a polypeptide that exhibits a degree
of sequence similarity to a polypeptide corresponding to a first gene. The sequence similarity can be found within a
functional domain or along the entire length of the coding sequence of the genes and/or their corresponding
polypeptides.

[0060] Percentage of sequence identity: "Percentage of sequence identity," as used herein, is determined by
comparing two optimally aligned sequences over a comparison window, where the fragment of the polynucleotide or
amino acid sequence in the comparison window may comprise additions or deletions (e.g., gaps or overhangs) as
compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two
sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid
base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number
of matched positions by the total number of positions in the window of comparison and multiplying the result by 100
to yield the percentage of sequence identity. Optimal alignment of sequences for comparison may be conducted by
the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment
algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and
Lipman, Proc. Natl. Acad. Sci. (USA) 85:2444 (1988), by computerized implementations of these algorithms (GAP,
BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group
(GCG), 575 Science Dr., Madison, WI), or by inspection. Given that two sequences have been identified for comparison,
GAP and BESTFIT are preferably employed to determine their optimal alignment. Typically, the default values of 5.00
for gap weight and 0.30 for gap weight length are used. The term "substantial sequence identity" between polynucleotide
or polypeptide sequences refers to polynucleotide or polypeptide comprising a sequence that has at least 80% sequence
identity, preferably at least 85%, more preferably at least 90% and most preferably at least 95%, even more
preferably, at least 96%, 97%, 98% or 99% sequence identity compared to a reference sequence using the programs.
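The arithmetic of this definition can be sketched for two sequences that have already been optimally aligned. The sketch below does not perform the alignment itself (that is the role of programs such as GAP or BESTFIT); it only counts matched positions. The function name, the use of "-" as the gap character, and the choice to count gap columns in the total number of positions are assumptions of the sketch.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over an existing pairwise alignment.

    Both inputs must be the same length (one character per alignment column).
    Matched positions are columns where both sequences carry the same residue;
    a gap never counts as a match. The total is the number of columns in the
    comparison window (an assumption: gap columns are included in the total).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matched = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matched / len(aligned_a)


# Ten alignment columns, eight of which are identical residues.
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
```

Under this convention, a pair of sequences reaching 80.0 or more would meet the lowest threshold given above for "substantial sequence identity".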
[0061] Plant Promoter: A "plant promoter" is a promoter capable of initiating transcription in plant cells and can
drive or facilitate transcription of a fragment of the SDF of the instant invention or a coding sequence of the SDF of the
instant invention. Such promoters need not be of plant origin. For example, promoters derived from plant viruses, such
as the CaMV 35S promoter, or from Agrobacterium tumefaciens, such as the T-DNA promoters, can be plant promoters.
A typical example of a plant promoter of plant origin is the maize ubiquitin-1 (ubi-1) promoter known to those of skill.

[0062] Promoter: The term "promoter," as used herein, refers to a region of sequence determinants located upstream from the start of transcription of a gene and which are involved in recognition and binding of RNA polymerase and other proteins to initiate and modulate transcription. A basal promoter is the minimal sequence necessary for assembly of a transcription complex required for transcription initiation. Basal promoters frequently include a "TATA box" element usually located between 15 and 35 nucleotides upstream from the site of initiation of transcription. Basal promoters also sometimes include a "CCAAT box" element (typically a sequence CCAAT) and/or a GGGCG sequence, usually located between 40 and 200 nucleotides, preferably 60 to 120 nucleotides, upstream from the start site of transcription.

[0063] Public sequence: The term "public sequence," as used in the context of the instant application, refers to any sequence that has been deposited in a publicly accessible database. This term encompasses both amino acid and nucleotide sequences. Such sequences are publicly accessible, for example, on the BLAST databases on the NCBI FTP web site (accessible at ncbi.nlm.gov/blast). The database at the NCBI FTP site utilizes "gi" numbers assigned by NCBI as a unique identifier for each sequence in the databases, thereby providing a non-redundant database for sequence from various databases, including GenBank, EMBL, DDBJ (DNA Database of Japan) and PDB (Brookhaven Protein Data Bank).

[0064] Regulatory Sequence: The term "regulatory sequence," as used in the current invention, refers to any nucleotide sequence that influences transcription or translation initiation and rate, and stability and/or mobility of the transcript or polypeptide product. Regulatory sequences include, but are not limited to, promoters, promoter control elements, protein binding sequences, 5' and 3' UTRs, transcriptional start site, termination sequence, polyadenylation sequence, introns, certain sequences within a coding sequence, etc.

[0065] Related Sequences: "Related sequences" refer to either a polypeptide or a nucleotide sequence that exhibits some degree of sequence similarity with a sequence described by the REF and SEQ tables.

[0066] Scaffold Attachment Region (SAR): As used herein, "scaffold attachment region" is a DNA sequence that anchors chromatin to the nuclear matrix or scaffold to generate loop domains that can have either a transcriptionally active or inactive structure (Spiker and Thompson (1996) Plant Physiol. 110: 15-21).

[0067] Sequence-determined DNA fragments (SDFs): "Sequence-determined DNA fragments" as used in the current invention are isolated sequences of genes, fragments of genes, intergenic regions or contiguous DNA from plant genomic DNA or cDNA or RNA the sequence of which has been determined.

[0068] Signal Peptide: A "signal peptide" as used in the current invention is an amino acid sequence that targets the protein for secretion, for transport to an intracellular compartment or organelle or for incorporation into a membrane. Signal peptides are indicated in the tables and a more detailed description located below.

[0069] Specific Promoter: In the context of the current invention, "specific promoters" refers to a subset of inducible promoters that have a high preference for being induced in a specific tissue or cell and/or at a specific time during development of an organism. By "high preference" is meant at least 3-fold, preferably 5-fold, more preferably at least 10-fold, still more preferably at least 20-fold, 50-fold or 100-fold increase in transcription in the desired tissue over the transcription in any other tissue. Typical examples of temporal and/or tissue specific promoters of plant origin that can be used with the polynucleotides of the present invention are: PTA29, a promoter which is capable of driving gene transcription specifically in tapetum and only during anther development (Koltunow et al., Plant Cell 2:1201 (1990)); RCc2 and RCc3, promoters that direct root-specific gene transcription in rice (Xu et al., Plant Mol. Biol. 27:237 (1995));

20 S^yaroot^ 

Promoters under developmental control include promoters that initiate transcription only in certain tissues or organs, such as root, ovule, fruit, seeds, or flowers. Other suitable promoters include those from genes encoding storage proteins or the lipid body membrane protein, oleosin. A few root-specific promoters are noted above.
[0070] Stringency: "Stringency" as used herein is a function of probe length, probe composition (G + C content), and salt concentration, organic solvent concentration, and temperature of hybridization or wash conditions. Stringency is typically compared by the parameter T_m, which is the temperature at which 50% of the complementary molecules in the hybridization are hybridized, in terms of a temperature differential from T_m. High stringency conditions are those providing a condition of T_m - 5°C to T_m - 10°C. Medium or moderate stringency conditions are those providing T_m - 20°C to T_m - 29°C. Low stringency conditions are those providing a condition of T_m - 40°C to T_m - 48°C. The relationship of hybridization conditions to T_m (in °C) is expressed in the mathematical equation

$$T_m = 81.5 - 16.6\,(\log_{10}[\mathrm{Na}^+]) + 0.41\,(\%\,\mathrm{G{+}C}) - (600/N) \tag{1}$$

where N is the length of the probe. This equation works well for probes 14 to 70 nucleotides in length that are identical to the target sequence. The equation below for T_m of DNA-DNA hybrids is useful for probes in the range of 50 to greater than 500 nucleotides, and for conditions that include an organic solvent (formamide).

$$T_m = 81.5 + 16.6\,\log\{[\mathrm{Na}^+]/(1 + 0.7[\mathrm{Na}^+])\} + 0.41\,(\%\,\mathrm{G{+}C}) - 500/L - 0.63\,(\%\,\mathrm{formamide}) \tag{2}$$

where L is the length of the probe in the hybrid. (P. Tijessen, "Hybridization with Nucleic Acid Probes" in Laboratory Techniques in Biochemistry and Molecular Biology, P.C. van der Vliet, ed., c. 1993 by Elsevier, Amsterdam.) The T_m of equation (2) is affected by the nature of the hybrid; for DNA-RNA hybrids T_m is 10-15°C higher than calculated, for RNA-RNA hybrids T_m is 20-25°C higher. Because the T_m decreases about 1°C for each 1% decrease in homology when a long probe is used (Bonner et al., J. Mol. Biol. 81:123 (1973)), stringency conditions can be adjusted to favor detection of identical genes or related family members.
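As a worked illustration of equation (2) and of the stringency ranges given in this definition, the following Python sketch computes T_m for a DNA-DNA hybrid and the hybridization temperature window for a chosen stringency. It is a minimal sketch under stated assumptions and not part of the original disclosure: the logarithm in equation (2) is taken to be base 10, [Na+] is taken to be in mol/L, and the function names tm_equation_2 and stringency_window are hypothetical.

```python
import math


def tm_equation_2(na_molar: float, gc_percent: float, probe_len: int,
                  formamide_percent: float = 0.0) -> float:
    """T_m in degrees C for a DNA-DNA hybrid, following equation (2).

    Assumptions: log is base 10 and na_molar is the Na+ concentration in mol/L.
    """
    if na_molar <= 0 or probe_len <= 0:
        raise ValueError("salt concentration and probe length must be positive")
    salt_term = 16.6 * math.log10(na_molar / (1.0 + 0.7 * na_molar))
    return (81.5 + salt_term + 0.41 * gc_percent
            - 500.0 / probe_len - 0.63 * formamide_percent)


# Degrees below T_m for each stringency class, as stated in the definition.
STRINGENCY_OFFSETS = {"high": (5, 10), "medium": (20, 29), "low": (40, 48)}


def stringency_window(tm: float, level: str) -> tuple:
    """Return (lowest, highest) temperature in degrees C for the given class."""
    nearest, farthest = STRINGENCY_OFFSETS[level]
    return (tm - farthest, tm - nearest)


if __name__ == "__main__":
    tm = tm_equation_2(na_molar=1.0, gc_percent=50.0, probe_len=500,
                       formamide_percent=50.0)
    print(round(tm, 1), stringency_window(tm, "high"))
```

Because the text notes that T_m is higher for DNA-RNA and RNA-RNA hybrids, a caller working with those hybrids would need to adjust the returned value; the sketch covers only the DNA-DNA case.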

[0071] Equation (2) is derived assuming equilibrium and therefore, hybridizations according to the present invention are most preferably performed under conditions of probe excess and for sufficient time to achieve equilibrium. The time required to reach equilibrium can be shortened by inclusion of a hybridization accelerator such as dextran sulfate or another high volume polymer in the hybridization buffer.

[0072] Stringency can be controlled during the hybridization reaction or after hybridization has occurred by altering the salt and temperature conditions of the wash solutions used. The formulas shown above are equally valid when used to compute the stringency of a wash solution. Preferred wash solution stringencies lie within the ranges stated above; high stringency is 5-8°C below T_m, medium or moderate stringency is 26-29°C below T_m and low stringency is

[0073] Substantially free of: A composition containing A is "substantially free of" B when at least 85% by weight of the total A+B in the composition is A. Preferably, A comprises at least about 90% by weight of the total of A+B in the composition, more preferably at least about 95% or even 99% by weight. For example, a plant gene or DNA sequence can be considered substantially free of other plant genes or DNA sequences.
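The weight criterion in this definition reduces to a single ratio, which the following Python sketch makes explicit. It is illustrative only: the function name is_substantially_free and its parameters are hypothetical, and the masses may be in any unit as long as both use the same one.

```python
def is_substantially_free(mass_a: float, mass_b: float,
                          threshold_percent: float = 85.0) -> bool:
    """True when A makes up at least threshold_percent by weight of A+B.

    The default of 85% is the minimum stated in the definition; the preferred
    levels (90%, 95%, 99%) can be passed as threshold_percent.
    """
    total = mass_a + mass_b
    if total <= 0:
        raise ValueError("total mass of A+B must be positive")
    return 100.0 * mass_a / total >= threshold_percent


if __name__ == "__main__":
    print(is_substantially_free(85.0, 15.0))
    print(is_substantially_free(85.0, 15.0, threshold_percent=90.0))
```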

[0074] Translational start site: In the context of the current invention, a "translational start site" is usually an ATG in the cDNA transcript, more usually the first ATG. A single cDNA, however, may have multiple translational start sites.

[0075] Transcription start site: "Transcription start site" is used in the current invention to describe the point at which transcription is initiated. This point is typically located about 25 nucleotides downstream from a TFIID binding site, such as a TATA box. Transcription can initiate at one or more sites within the gene, and a single gene may have multiple transcriptional start sites, some of which may be specific for transcription in a particular cell-type or tissue.

[0076] Untranslated region (UTR): A "UTR" is any contiguous series of nucleotide bases that is transcribed, but is not translated. These untranslated regions may be associated with particular functions such as increasing mRNA message stability. Examples of UTRs include, but are not limited to polyadenylation signals, terminations sequences, sequences located between the transcriptional start site and the first exon (5' UTR) and sequences located between the last exon and the end of the mRNA (3' UTR).

[0077] Variant: The term "variant" is used herein to denote a polypeptide or protein or polynucleotide molecule that differs from others of its kind in some way. For example, polypeptide and protein variants can consist of changes in amino acid sequence and/or charge and/or post-translational modifications (such as glycosylation, etc).

DETAILED DESCRIPTION OF THE INVENTION

I. Polynucleotides



[0078] Exemplified SDFs of the invention represent fragments of the genome of corn, wheat, rice, soybean or Arabidopsis and/or represent mRNA expressed from that genome. The isolated nucleic acid of the invention also encompasses corresponding fragments of the genome and/or cDNA complement of other organisms as described in detail below.

[0079] Polynucleotides of the invention can be isolated from polynucleotide libraries using primers comprising sequences described by the REF and SEQ tables.

[0080] Alternatively, the polynucleotides of the invention can be produced by chemical synthesis.

[0081] It is contemplated that the nucleotide sequences presented herein may contain some small percentage of errors. These errors may arise in the normal course of determination of nucleotide sequences. Sequence errors can be corrected by obtaining seeds deposited under the accession numbers cited above, propagating them, isolating genomic DNA or appropriate mRNA from the resulting plants or seeds thereof, amplifying the relevant fragment of the genomic DNA or mRNA using primers having a sequence that flanks the erroneous sequence, and sequencing the amplification product.

I.A. Probes, Primers and Substrates

[0082] SDFs of the invention can be applied to substrates for use in array applications such as, but not limited to, assays of global gene expression, for example under varying conditions of development, growth conditions. The arrays can also be used in diagnostic or forensic methods (WO95/35505, US 5,445,943 and US 5,410,270).

[0083] Probes and primers of the instant invention will hybridize to a polynucleotide comprising a sequence in REF and SEQ TABLES 1 AND 2. Though many different nucleotide sequences can encode an amino acid sequence, the sequences of REF and SEQ TABLES 1 AND 2 are generally preferred for encoding polypeptides of the invention. However, the sequence of the probes and/or primers of the instant invention need not be identical to those in REF and SEQ TABLES 1 AND 2 or the complements thereof. For example, some variation in probe or primer sequence and/or length can allow additional family members to be detected, as well as orthologous genes and more taxonomically distant related sequences. Similarly, probes and/or primers of the invention can include additional nucleotides that serve as a label for detecting the formed duplex or for subsequent cloning purposes.

[0084] Probe length will vary depending on the application. For use as primers, probes are 12-40 nucleotides, preferably 18-30 nucleotides long. For use in mapping, probes are preferably 50 to 500 nucleotides, preferably 100-250 nucleotides long. For Southern hybridizations, probes as long as several kilobases can be used as explained below.

[0085] The probes and/or primers can be produced by synthetic procedures such as the triester method of Matteucci et al. J. Am. Chem. Soc. 103:3185 (1981); or according to Urdea et al. Proc. Natl. Acad. 80:7461 (1981) or using commercially available automated oligonucleotide synthesizers.

I.B. Methods of Detection and Isolation



[0086] The polynucleotides of the invention can be utilized in a number of methods known to those skilled in the art as probes and/or primers to isolate and detect polynucleotides, including, without limitation: Southerns, Northerns, Branched DNA hybridization assays, polymerase chain reaction, and microarray assays, and variations thereof. Specific methods given by way of examples, and discussed below include:

Hybridization 
Methods of Mapping 
Southern Blotting 

Isolating cDNA from Related Organisms 
Isolating and/or Identifying Orthologous Genes. 

Also, the nucleic acid molecules of the invention can be used in other methods, such as high density

IS; WO mS° WO mIU; WO 9915658; WO 9906572; WO 9B58052; WO 9958672; and WO 9810858. 

B.1. Hybridization

[0087] The isolated SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be used as probes and/or primers for detection and/or isolation of related polynucleotide sequences through hybridization. Hybridization of one nucleic acid to another constitutes a physical property that defines the subject SDF of the invention and the identified related sequences. Also, such hybridization imposes structural limitations on the pair. A good general discussion of the factors for determining hybridization conditions is provided by Sambrook et al. ("Molecular Cloning, a Laboratory Manual," 2nd ed., c. 1989 by Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; see esp. chapters 1 and 12). Additional considerations and details of the physical chemistry of hybridization are provided by G.H. Keller and M.M. Manak "DNA Probes", 2nd Ed. pp. 1-25, c. 1993 by Stockton Press, New York, NY.

[0088] Depending on the stringency of the conditions under which these probes and/or primers are used, polynucleotides exhibiting a wide range of similarity to those in REF and SEQ TABLES 1 AND 2 can be detected or isolated. When the practitioner wishes to examine the result of membrane hybridizations under a variety of stringencies, an efficient way to do so is to perform the hybridization under a low stringency condition, then to wash the hybridization membrane under increasingly stringent conditions.

[0089] When using SDFs to identify orthologous genes in other species, the practitioner will preferably adjust the amount of target DNA of each species so that, as nearly as is practical, the same number of genome equivalents are present for each species examined. This prevents faint signals from species having large genomes, and thus small numbers of genome equivalents per mass of DNA, from erroneously being interpreted as absence of the corresponding gene in the genome.
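The adjustment of target DNA so that each species contributes the same number of genome equivalents is a short calculation, sketched below in Python. This is an illustration under stated assumptions, not part of the original disclosure: an average molar mass of about 650 g/mol per base pair of double-stranded DNA is assumed, the genome sizes in the usage example are hypothetical placeholders rather than measured values, and the function name dna_mass_ng is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
BP_MOLAR_MASS = 650.0     # g/mol per base pair; a commonly used approximation


def dna_mass_ng(genome_size_bp: float, genome_equivalents: float) -> float:
    """Mass of genomic DNA (ng) holding the requested number of genome equivalents."""
    if genome_size_bp <= 0 or genome_equivalents <= 0:
        raise ValueError("genome size and genome equivalents must be positive")
    grams = genome_equivalents * genome_size_bp * BP_MOLAR_MASS / AVOGADRO
    return grams * 1e9  # grams to nanograms


if __name__ == "__main__":
    # Hypothetical haploid genome sizes in base pairs, for illustration only.
    genome_sizes_bp = {"species_A": 1.5e8, "species_B": 2.5e9}
    for name, size in genome_sizes_bp.items():
        # Load the same 1e6 genome equivalents of every species.
        print(name, round(dna_mass_ng(size, 1e6), 1), "ng")
```

The required mass scales linearly with genome size, so a species with a larger genome needs proportionally more DNA per lane to give a comparable signal.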
[0090] The probes and/or primers of the instant invention can also be used to detect or isolate nucleotides that are "identical" to the probes or primers. Two nucleic acid sequences or polypeptides are said to be "identical" if the sequence of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum

presented in REF and SEQ TABLES 1 AND 2. The probes and/or primers of the invention can also be used to detect and/or isolate polynucleotides exhibiting at least 80% sequence identity with the sequences of REF and SEQ TABLES 1 AND 2.

[0092] With respect to nucleotide sequences, degeneracy of the genetic code provides the possibility to substitute at least one base of the base sequence of a gene with a different base without causing the amino acid sequence of the polypeptide produced from the gene to be changed. Hence, the DNA of the present invention may also have any base sequence that has been changed from a sequence in REF and SEQ TABLES 1 AND 2 by substitution in accordance with degeneracy of genetic code. References describing codon usage include: Carels et al., J. Mol. Evol. 46: 45 (1998) and Fennoy et al., Nucl. Acids Res. 21(23): 5294 (1993).
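The effect of degeneracy can be seen by translating two coding sequences that differ at third-codon positions. The Python sketch below is illustrative only and is not part of the original disclosure. It assumes the standard genetic code, and the two short sequences are made-up examples rather than sequences from the REF and SEQ tables. The names translate and encode_same_polypeptide are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (length a multiple of 3) into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encode_same_polypeptide(dna_a: str, dna_b: str) -> bool:
    """True when two DNA sequences differ at most by synonymous substitutions."""
    return translate(dna_a) == translate(dna_b)


if __name__ == "__main__":
    original = "ATGGCTCTT"     # made-up example sequence
    substituted = "ATGGCCCTG"  # third-position changes in codons 2 and 3
    print(translate(original), translate(substituted),
          encode_same_polypeptide(original, substituted))
```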

B.2. Mapping 

[0093] The isolated SDF DNA of the invention can be used to create various types of genetic and physical maps of the genome of corn, Arabidopsis, soybean, rice, wheat, or other plants. Some SDFs may be absolutely associated with particular phenotypic traits, allowing construction of gross genetic maps. While not all SDFs will immediately be

[2347] Suspension culture cells are transformed with exogenous DNA as described by Z. Chen et al. Plant Mol.
2* f i f 9 £h 98) - , Briefly ' ^ P° s '- subculture «* are incubated with cell wall digestbn sS^ o tSX 
h , h ' ? «f Se> MES ( 2 -t N - Mor P ho "n°] ethanesulfonic acid) pH 5.0 for 5 hours. The digested c^s are 

washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 125 mM CaCl2 and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution containing 5 mM MES, 20 mM CaCl2, 0.5 M mannitol, pH 5.7 and the protoplast density is adjusted to about 4 x 10^6 protoplasts per ml.
[2348] 15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting suspension is mixed with 40%
52 ::?^' W 8 ° 00 " PEG 80 °°>- * 9ent,e inversion a few times a, room temperature fo 5 to S £f 
in T^IZZ T ed, r^T n *" 3rt iS add6d int ° the p EG-DNA-proto P last mixture. Protoplasts are incubaTd 
in the culture medium for 24 hours to 5 days and cell extracts can be used for assay of transient expression of the

SUSS A ' tem f ?' V ranSf ° rmed Ce " S 030 be USed tG P roduce trans 9*™ callus, which in turn ca *e led 

SBB491 (1985), /cfe^car,™ anc , /sofeton cg/nofe Cells that Produce Somatic Embryos in Carrot Suspension cUl 

The invention being thus described, it will be apparent to one of ordinary skill in the art that various modifications of the materials and methods for practicing the invention can be made. Such modifications are to be considered within the scope of the invention as defined by the following claims.



Claims 



1. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which encodes an amino acid sequence exhibiting at least 40% sequence identity to an amino acid sequence encoded by

(a) a nucleotide sequence described in REF and/or SEQ Table 1 or 2 or a fragment thereof or 

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof. 

2. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

3. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least 65% sequence identity to a gene comprising

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

4. An isolated nucleic acid molecule which is the reverse of the isolated nucleotide sequence according to any one of claims 1-3, such that the reverse nucleotide sequence has a sequence order which is the reverse of the sequence order of said isolated nucleotide sequence according to any one of claims 1-3.

5. An isolated nucleic acid molecule comprising a nucleic acid capable of hybridizing to a nucleic acid having a sequence selected from the group consisting of:

(a) a nucleotide sequence which is shown in REF and/or SEQ Table 1 or 2; and

(b) a nucleotide sequence which is complementary to a nucleotide sequence shown in REF and/or SEQ Table 1 or 2,

under conditions that permit formation of a nucleic acid duplex at a temperature from about 40°C and 48°C below the melting temperature of the nucleic acid duplex.

6. The nucleic acid molecule according to any one of claims 1-5, wherein said nucleic acid comprises an open reading frame.

7. The isolated nucleic acid molecule of any one of claims 1-5, wherein said nucleic acid is capable of functioning as a promoter, a 3' end termination sequence, an untranslated region (UTR), or as a regulatory sequence.

8. The isolated nucleic acid molecule of claim 7, wherein said nucleic acid is a promoter comprising a sequence selected from the group consisting of a TATA box sequence, a CAAT box sequence, a motif of GCAATCG or any transcription-factor binding sequence, and any combination thereof.

9. The isolated nucleic acid molecule of claim 7, wherein the nucleic acid sequence is a regulatory sequence which is capable of promoting seed-specific expression, embryo-specific expression, ovule-specific expression, tapetum-specific expression or root-specific expression of a sequence or any combination thereof.

10. A vector construct comprising a nucleic acid molecule according to any one of claims 1-9, wherein said nucleic 
acid molecule is heterologous to any element in said vector construct. 

11. A vector construct according to claim 10 comprising:

(a) a first nucleic acid having a regulatory sequence capable of causing transcription and/or translation; and 

(b) a second nucleic acid having the sequence of said isolated nucleic acid molecule according to any one of 
claims 1-4; 

wherein said first and second nucleic acids are operably linked and wherein said second nucleic acid is heterolo- 
gous to any element in said vector construct. 

12. The vector construct according to claim 11, wherein said first nucleic acid is native to said second nucleic acid.

13. The vector construct according to claim 11, wherein said first nucleic acid is heterologous to said second nucleic acid.

14. A vector construct according to claim 10 comprising:

(c) a first nucleic acid having the sequence of said isolated nucleic acid molecule according to claim 7; and

(d) a second nucleic acid;

wherein said first and second nucleic acids are operably linked and wherein said first nucleic acid is heterologous to any element in said vector construct.

15. The vector construct according to claim 14, wherein said first nucleic acid is native to said second nucleic acid.

16. The vector construct according to claim 14, wherein said first nucleic acid is heterologous to said second nucleic acid.

17. A host cell comprising an isolated nucleic acid molecule according to any one of claims 1-4, wherein said nucleic acid molecule is flanked by exogenous sequence.

18. A host cell comprising a vector construct of any one of claims 10-16. 

19. An isolated polypeptide comprising an amino acid sequence 

(a) exhibiting at least 40% sequence identity to an amino acid sequence encoded by a sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; and

(b) capable of exhibiting at least one of the biological activities of the polypeptide encoded by said nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

20. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 75% sequence identity to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

21. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 85% sequence identity to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

22. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 90% sequence identity to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

23. An antibody capable of binding the isolated polypeptide of any one of claims 19-22.

24. A method of introducing an isolated nucleic acid into a host cell comprising: 

(a) providing an isolated nucleic acid molecule according to any one of claims 1-4; and

(b) contacting said isolated nucleic acid with said host cell under conditions that permit insertion of said nucleic acid into said host cell.

25. A method of transformation which comprises contacting a host cell with a vector construct according to any one of claims 10-16.

26. A method of modulating transcription and/or translation of a nucleic acid in a host cell comprising: 

(a) providing the host cell of claim 24 or 25; and 

(b) culturing said host cell under conditions that permit transcription or translation. 

27. A method for detecting a nucleic acid in a sample which comprises: 

(a) providing an isolated nucleic acid molecule according to any one of claims 1-5;

(b) contacting said isolated nucleic acid molecule with a sample under conditions which permit a comparison of the sequence of said isolated nucleic acid molecule with the sequence of DNA in said sample; and

(c) analyzing the result of said comparison.

28. The method according to claim 27, wherein said isolated nucleic acid molecule and said sample are contacted under conditions which permit the formation of a duplex between complementary nucleic acid sequences.

29. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4 which is exogenous to said plant or plant cell.

30. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4 wherein said nucleic acid molecule is heterologous to said plant or said cell of a plant.

31. A plant or cell of a plant which has been transformed with a nucleic acid molecule according to any one of claims 1-4.

32. A plant or cell of a plant which comprises a vector construct according to any one of claims 10-16.

33. A plant or cell of a plant which has been transformed with a vector construct according to any one of claims 10-16.

34. A plant which has been regenerated from a plant cell according to any one of claims 29-33.